Graph processing applications are severely bottlenecked by memory system performance due to low data reuse and irregular memory accesses. While state-of-the-art prefetchers using Machine Learning (ML) have made great progress, they do not perform well on graph analytics applications due to phase transitions in the execution and irregular data access that is hard to predict. We propose MPGraph: a novel ML-based Prefetcher for Graph analytics. MPGraph makes three novel optimizations based on domain knowledge of graph analytics. It detects the transition of graph processing phases during execution using a novel soft detection technique, predicts memory accesses and pages using phase-specific multi-modality predictors, and prefetches using a novel chain spatio-temporal prefetching strategy. We evaluate our approach using three widely-used graph processing frameworks and a variety of graph datasets. Our approach achieves 34.17%-82.15% higher precision in phase transition detection than the KSWIN and decision tree baselines. Our predictors achieve 6.80%-16.02% higher F1-score for access prediction and 11.68%-15.41% higher accuracy-at-10 for page prediction compared with the baselines LSTM-based and vanilla attention-based models. Simulations show that MPGraph achieves on the average 87.16% (prefetch accuracy) and 73.29% (prefetch coverage), leading to 12.52%-21.23% IPC improvement. It outperforms the widely-used non-ML prefetcher BO by 7.58%-12.03%, and outperforms state-of-the-art ML-based prefetchers Voyager by 3.27%-4.42% and TransFetch by 3.73%-4.58% with respect to IPC improvement.
translated by 谷歌翻译
强化学习(RL)在机器人,游戏和医疗保健等应用领域取得了重大成功。但是,培训RL代理商非常耗时。由于CPU上的不规则内存访问和线程级同步开销等挑战,当前的实现表现出较差的性能。在这项工作中,我们提出了一种用于在多核系统上产生可扩展的强化学习实现的框架。重放缓冲区是RL算法的一个关键组件,其有助于存储从环境相互作用和用于学习过程的数据采样的样本。我们为基于$ k $ $-arty sum树定义了一个新的数据结构,用于支持异步并行插入,采样和优先级更新。为解决不规则内存访问的挑战,我们提出了一种新颖的数据布局来存储减少缓存未命中的SUCH树的节点。此外,我们提出$ \ Textit {懒惰的写入} $机制,以减少重放缓冲区操作的线程级同步开销。我们的框架采用平行演员通过环境交互和并行学习者同时收集数据,并使用收集的数据执行随机梯度下降。我们的框架支持各种强化学习算法,包括DQN,DDPG等。我们通过使用OpenAI基准对CPU + GPU平台进行实验来证明我们的框架在加速RL算法中的有效性。
translated by 谷歌翻译
在多机构强化学习中,沟通对于鼓励代理商之间的合作至关重要。由于网络条件随代理的移动性而变化,并且在传输过程中的随机性变化,因此现实无线网络中的通信可能非常不可靠。我们提出一个框架来通过解决三个基本问题来学习实用的沟通策略:(1)何时:代理商不仅基于消息重要性,而且是无线渠道条件来学习沟通时间。 (2)什么:代理增强了带有无线网络测量结果的消息内容,以更好地选择游戏和通信操作。 (3)如何:代理使用新颖的神经信息编码器来保存从接收到的消息中保留所有信息,而不管消息的数量和顺序如何。与最新的ART相比,在逼真的无线网络设置下模拟标准基准测试,我们在游戏性能,收敛速度和沟通效率方面取得了重大改进。
translated by 谷歌翻译
张量分解已成为许多数据科学应用中的重要工具。稀疏的进型张量时间Khatri-Rao产品(MTTKRP)是张量分解算法中的关键核,可将高阶现实世界大张量分解为多个矩阵。加速MTTKRP可以极大地加速张量分解过程。由于其不规则的内存访问特性,稀疏的MTTKRP是一个充满挑战的内核。由于能源效率和FPGA固有的并行性,在诸如MTTKRP等内核的现场可编程门阵列(FPGA)上实现加速器。本文探讨了在MTTKRP上设计自定义内存控制器的机会,关键挑战和方法,同时探索了这种自定义内存控制器的参数空间。
translated by 谷歌翻译
智能手机已经使用基于生物识别的验证系统,以在高度敏感的应用中提供安全性。视听生物识别技术因其可用性而受欢迎,并且由于其多式化性质,欺骗性将具有挑战性。在这项工作中,我们介绍了一个在五个不同最近智能手机中捕获的视听智能手机数据集。考虑到不同的现实情景,这个新数据集包含在三个不同的会话中捕获的103个科目。在该数据集中获取三种不同的语言,以包括扬声器识别系统的语言依赖性问题。这些数据集的这些独特的特征将为实施新的艺术技术的单向或视听扬声器识别系统提供途径。我们还报告了DataSet上的基准标记的生物识别系统的性能。生物识别算法的鲁棒性朝向具有广泛实验的重播和合成信号等信号噪声,设备,语言和呈现攻击等多种依赖性。获得的结果提出了许多关于智能手机中最先进的生物识别方法的泛化特性的担忧。
translated by 谷歌翻译
Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs. To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the "neighbor explosion" problem during minibatch training. We propose GraphSAINT, a graph sampling based inductive learning method that improves training efficiency and accuracy in a fundamentally different way. By changing perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. Each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure fixed number of well-connected nodes in all layers. We further propose normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling from the forward and backward propagation, and extend GraphSAINT with many architecture variants (e.g., graph attention, jumping connection). GraphSAINT demonstrates superior performance in both accuracy and training time on five large graphs, and achieves new state-of-the-art F1 scores for PPI (0.995) and Reddit (0.970).
translated by 谷歌翻译
Designing experiments often requires balancing between learning about the true treatment effects and earning from allocating more samples to the superior treatment. While optimal algorithms for the Multi-Armed Bandit Problem (MABP) provide allocation policies that optimally balance learning and earning, they tend to be computationally expensive. The Gittins Index (GI) is a solution to the MABP that can simultaneously attain optimality and computationally efficiency goals, and it has been recently used in experiments with Bernoulli and Gaussian rewards. For the first time, we present a modification of the GI rule that can be used in experiments with exponentially-distributed rewards. We report its performance in simulated 2- armed and 3-armed experiments. Compared to traditional non-adaptive designs, our novel GI modified design shows operating characteristics comparable in learning (e.g. statistical power) but substantially better in earning (e.g. direct benefits). This illustrates the potential that designs using a GI approach to allocate participants have to improve participant benefits, increase efficiencies, and reduce experimental costs in adaptive multi-armed experiments with exponential rewards.
translated by 谷歌翻译
Making histopathology image classifiers robust to a wide range of real-world variability is a challenging task. Here, we describe a candidate deep learning solution for the Mitosis Domain Generalization Challenge 2022 (MIDOG) to address the problem of generalization for mitosis detection in images of hematoxylin-eosin-stained histology slides under high variability (scanner, tissue type and species variability). Our approach consists in training a rotation-invariant deep learning model using aggressive data augmentation with a training set enriched with hard negative examples and automatically selected negative examples from the unlabeled part of the challenge dataset. To optimize the performance of our models, we investigated a hard negative mining regime search procedure that lead us to train our best model using a subset of image patches representing 19.6% of our training partition of the challenge dataset. Our candidate model ensemble achieved a F1-score of .697 on the final test set after automated evaluation on the challenge platform, achieving the third best overall score in the MIDOG 2022 Challenge.
translated by 谷歌翻译
Modelling and forecasting real-life human behaviour using online social media is an active endeavour of interest in politics, government, academia, and industry. Since its creation in 2006, Twitter has been proposed as a potential laboratory that could be used to gauge and predict social behaviour. During the last decade, the user base of Twitter has been growing and becoming more representative of the general population. Here we analyse this user base in the context of the 2021 Mexican Legislative Election. To do so, we use a dataset of 15 million election-related tweets in the six months preceding election day. We explore different election models that assign political preference to either the ruling parties or the opposition. We find that models using data with geographical attributes determine the results of the election with better precision and accuracy than conventional polling methods. These results demonstrate that analysis of public online data can outperform conventional polling methods, and that political analysis and general forecasting would likely benefit from incorporating such data in the immediate future. Moreover, the same Twitter dataset with geographical attributes is positively correlated with results from official census data on population and internet usage in Mexico. These findings suggest that we have reached a period in time when online activity, appropriately curated, can provide an accurate representation of offline behaviour.
translated by 谷歌翻译
Existing federated classification algorithms typically assume the local annotations at every client cover the same set of classes. In this paper, we aim to lift such an assumption and focus on a more general yet practical non-IID setting where every client can work on non-identical and even disjoint sets of classes (i.e., client-exclusive classes), and the clients have a common goal which is to build a global classification model to identify the union of these classes. Such heterogeneity in client class sets poses a new challenge: how to ensure different clients are operating in the same latent space so as to avoid the drift after aggregation? We observe that the classes can be described in natural languages (i.e., class names) and these names are typically safe to share with all parties. Thus, we formulate the classification problem as a matching process between data representations and class representations and break the classification model into a data encoder and a label encoder. We leverage the natural-language class names as the common ground to anchor the class representations in the label encoder. In each iteration, the label encoder updates the class representations and regulates the data representations through matching. We further use the updated class representations at each round to annotate data samples for locally-unaware classes according to similarity and distill knowledge to local models. Extensive experiments on four real-world datasets show that the proposed method can outperform various classical and state-of-the-art federated learning methods designed for learning with non-IID data.
translated by 谷歌翻译